Search CORE

10 research outputs found

Algorithmic Techniques in Gene Expression Processing. From Imputation to Visualization

Author: Tuikkala Johannes
Publication venue: Turku Centre for Computer Science
Publication date: 20/11/2014

The amount of biological data has grown exponentially in recent decades. Modern biotechnologies, such as microarrays and next-generation sequencing, are capable of producing massive amounts of biomedical data in a single experiment. As the amount of data grows rapidly, there is an urgent need for reliable computational methods for analyzing and visualizing it. This thesis addresses this need by studying how to efficiently and reliably analyze and visualize high-dimensional data, especially data obtained from gene expression microarray experiments. First, we studied ways to improve the quality of microarray data by replacing (imputing) missing data entries with estimated values. Missing value imputation is commonly used to make incomplete data complete, which makes the data easier to analyze with statistical and computational methods. Our novel approach was to use curated external biological information as a guide for the missing value imputation. Second, we studied the effect of missing value imputation on downstream data analysis methods such as clustering. We compared multiple recent imputation algorithms on 8 publicly available microarray data sets. Missing value imputation was observed to be a rational way to improve the quality of biological data. The research revealed differences between the clustering results obtained with different imputation methods. On most data sets, the simple and fast k-NN imputation was good enough, but some called for more advanced imputation methods, such as the Bayesian Principal Component Algorithm (BPCA). Finally, we studied the visualization of biological network data. Biological interaction networks are examples of the outcome of multiple biological experiments, such as those using gene microarray techniques.
Such networks are typically very large and highly connected, so there is a need for fast algorithms that produce visually pleasing layouts. A computationally efficient way to produce layouts of large biological interaction networks was developed. The algorithm uses multilevel optimization within the regular force-directed graph layout algorithm. Transferred from Doria.

UTUPub

Missing value imputation improves clustering and interpretation of gene expression microarray data

Author: AG de Brevern
D Wang
G Feten
H Kim
H Kuhn
H Yoshimoto
I Scheel
J Handl
J He
J Hu
J Tuikkala
JJ Wyrick
JL DeRisi
Johannes Tuikkala
Laura L Elo
M Al-Daoud
M Hirao
M Kankainen
M Ronen
M Shapira
MJ Brauer
O Troyanskaya
Olli S Nevalainen
P D'haeseleer
PT Spellman
R Jörnsten
S Oba
S Tavazoie
T Lange
Tero Aittokallio
TR Golub
X Gan
X Wang
Y Shi
Z Cai
Publication venue: BioMed Central
Publication date: 01/04/2008

Abstract. Background: Missing values frequently pose problems in gene expression microarray experiments as they can hinder downstream analysis of the datasets. While several missing value imputation approaches are available to the microarray users and new ones are constantly being developed, there is no general consensus on how to choose between the different methods since their performance seems to vary drastically depending on the dataset being used. Results: We show that this discrepancy can mostly be attributed to the way in which imputation methods have traditionally been developed and evaluated. By comparing a number of advanced imputation methods on recent microarray datasets, we show that even when there are marked differences in the measurement-level imputation accuracies across the datasets, these differences become negligible when the methods are evaluated in terms of how well they can reproduce the original gene clusters or their biological interpretations. Regardless of the evaluation approach, however, imputation always gave better results than ignoring missing data points or replacing them with zeros or average values, emphasizing the continued importance of using more advanced imputation methods. Conclusion: The results demonstrate that, while missing values are still severely complicating microarray data analysis, their impact on the discovery of biologically meaningful gene groups can – up to a certain degree – be reduced by using readily available and relatively fast imputation methods, such as the Bayesian Principal Components Algorithm (BPCA).
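To make the idea of imputation concrete, the following is a minimal sketch of nearest-neighbour (k-NN) imputation on a small gene-by-sample matrix. It is an illustration only, not the implementation evaluated in the paper: the function name `knn_impute`, the use of `None` for missing entries, the root-mean-square distance over shared columns, and the unweighted neighbour average are all assumptions made for this example.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries using the k most similar rows (genes).

    Hypothetical helper for illustration. Distance is the root-mean-square
    difference over the columns that both rows have observed; a missing
    entry is replaced by the plain average of the neighbours' values.
    """
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b)
                  if x is not None and y is not None]
        if not shared:
            return math.inf  # no common observations: not comparable
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate neighbours: other rows with column j observed.
            candidates = [
                (distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            ]
            candidates = [c for c in candidates if c[0] != math.inf]
            candidates.sort(key=lambda c: c[0])
            nearest = candidates[:k]
            if nearest:  # leave the gap if no usable neighbour exists
                imputed[i][j] = sum(v for _, v in nearest) / len(nearest)
    return imputed


# Toy data (made up): rows are genes, columns are samples.
expression = [
    [1.0, 2.0, 3.0, 4.0],
    [1.1, 2.1, None, 4.1],
    [0.9, 1.9, 2.9, 3.9],
    [8.0, 7.0, 6.0, 5.0],
]
filled = knn_impute(expression, k=2)
print(filled[1][2])
```

In this toy matrix the missing entry is estimated from the two genes with the most similar profiles, so the dissimilar fourth gene does not influence it. A column average, by contrast, would be pulled toward that outlying gene.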

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

A multilevel layout algorithm for visualizing physical and genetic interaction networks, with emphasis on their modular organization

Author: Aittokallio Tero
Nevalainen Olli S
Salmela Pekka
Tuikkala Johannes
Vähämaa Heidi
Publication venue: Springer Science and Business Media LLC
Publication date: 01/03/2012

Abstract. Background: Graph drawing is an integral part of many systems biology studies, enabling visual exploration and mining of large-scale biological networks. While a number of layout algorithms are available in popular network analysis platforms, such as Cytoscape, it remains poorly understood how well their solutions reflect the underlying biological processes that give rise to the network connectivity structure. Moreover, visualizations obtained using conventional layout algorithms, such as those based on the force-directed drawing approach, may become uninformative when applied to larger networks with dense or clustered connectivity structure. Methods: We implemented a modified layout plug-in, named Multilevel Layout, which applies the conventional layout algorithms within a multilevel optimization framework to better capture the hierarchical modularity of many biological networks. Using a wide variety of real life biological networks, we carried out a systematic evaluation of the method in comparison with other layout algorithms in Cytoscape. Results: The multilevel approach provided both biologically relevant and visually pleasant layout solutions in most network types, hence complementing the layout options available in Cytoscape. In particular, it could improve drawing of large-scale networks of yeast genetic interactions and human physical interactions. In more general terms, the biological evaluation framework developed here enables one to assess the layout solutions from any existing or future graph drawing algorithm as well as to optimize their performance for a given network type or structure. Conclusions: By making use of the multilevel modular organization when visualizing biological networks, together with the biological evaluation of the layout solutions, one can generate convenient visualizations for many network biology applications.

Directory of Open Access Journals

Towards better accuracy for missing value estimation of epistatic miniarray profiling data by a novel ensemble approach

Author: Aittokallio
Bian
Bo
Brock
Cai
Candes
Candes
Collins
Elo
Farhangfar
Farhangfar
Fazel
Fiedler
Friedland
Gu
Ho
Hong-Bin Shen
Jerez
Johannes
Kim
Nanni
Nanni
Nanni
Nanni
Nanni
Nanni
Oba
Polikar
Polikar
Pu
Roguev
Ryan
Schneider
Schuldiner
Tong
Troyanskaya
Tuikkala
Wilmes
Xiao-Yong Pan
Yan Huang
Ye Tian
Publication venue: Elsevier BV

Crossref

A multilevel layout algorithm for visualizing physical and genetic interaction networks, with emphasis on their modular organization

Crossref

Integrative Analysis of Transcriptomic and Proteomic Data: Challenges, Solutions and Applications

Author: Aebersold R.
Akashi H.
Alter O.
Anderle M.
Anderson L.
Aubert C.
Basler M.
Beck G. R.
Berg O. G.
Berrar D. P.
Beyer A.
Box G. E. P.
Breen E. J.
Bronstrup M
Brotz-Oesterhelt H.
Brown C. M.
Bø T. H.
Chen G.
Chen G.
Collins R. F.
Conrads K. A.
Cox B.
David E. Culley
Dethlefsen L.
Durbin B. P.
Faxen M.
Freiberg C.
Gang Wu
Gao J.
Ghaemmaghami S.
Gowrishankar J.
Greenbaum D.
Greenbaum D.
Griffin T. J.
Hack C. J
Hegde P. S.
Heidelberg J. F.
Horak C. E.
Huber W.
Ideker T.
Johannes C. M. Scholten
Jung K.
Kane J. F
Khodursky A. B.
Kim H.
Kleinbaum D. G.
Labbe A.
Lee J. H.
Lee T. I.
Lei Nie
Lichtinghagen R.
Lithwick G.
MacKay V. L.
Maziarz M.
McCarthy J. E. G.
McCullagh P.
McLachlan G. J.
Mehra A.
Mehra A.
Mootha V. K.
Mootha V. K.
Munoz E. T.
Nie L.
Nie L.
Nie L.
Orntoft T. F.
Poole E. S.
Purohit P. V.
Resch A.
Resing K. A.
Rhodius V. A.
Rocha E. P.
Romby P.
Scherl A.
Scherl A.
Selinger D. W.
Shimizu T.
Shine J.
Sorensen M. A.
Spellman P. T.
Stenstrom C. M.
Tjaden B.
Troyanskaya O.
Tuikkala J.
Vellanoweth R. L.
Wang D.
Washburn M. P.
Weiwen Zhang
Wilkins M. R.
Yu X. L.
Zhang W.
Zhang W.
Publication venue: Informa UK Limited

Crossref